Improving Structured Grid-Based Sparse Matrix-Vector Multiplication and Gauss–Seidel Iteration on GPDSP
Authors
Abstract
Structured grid-based sparse matrix-vector multiplication (SpMV) and Gauss–Seidel iteration are very important kernel functions in scientific and engineering computations, and both are memory-intensive and bandwidth-limited. The GPDSP is a general-purpose digital signal processor, a significant class of embedded processor that has been introduced into high-performance computing. In this paper, we designed various optimization methods for structured SpMV and Gauss–Seidel iteration on the GPDSP, including a blocking method to improve data locality and increase memory access efficiency, multicolor reordering to develop fine-grained parallelism, a partitioning scheme tailored to the grid structures, and double buffering to overlap computation with data transfer. Finally, we combined the above methods to design multicore vectorization algorithms. We tested matrices generated from grids of different sizes on the platform and obtained speedups of up to 41× and 47× compared with the unoptimized iterations, with maximum bandwidth efficiencies of 72% and 81%, respectively. The experimental results show that our algorithms can fully utilize the external memory bandwidth. We also implemented the commonly used mixed-precision algorithm and obtained further speedups of 1.60× and 1.45×.
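A minimal sketch of the kind of kernels the abstract refers to, written in C for a generic CPU rather than as the paper's GPDSP code: a 5-point-stencil SpMV on an n × n structured grid and one red-black (two-color) Gauss–Seidel sweep, where the coloring is what exposes the fine-grained parallelism mentioned above. The stencil, grid layout, and function names are illustrative assumptions, not taken from the paper.

/* Sketch only: 2D 5-point Laplacian on an n x n structured grid. */
#include <stddef.h>

#define IDX(i, j, n) ((i) * (n) + (j))

/* y = A*x for the 5-point stencil (out-of-range neighbors treated as zero). */
static void spmv_5pt(size_t n, const double *x, double *y) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double s = 4.0 * x[IDX(i, j, n)];
            if (i > 0)     s -= x[IDX(i - 1, j, n)];
            if (i + 1 < n) s -= x[IDX(i + 1, j, n)];
            if (j > 0)     s -= x[IDX(i, j - 1, n)];
            if (j + 1 < n) s -= x[IDX(i, j + 1, n)];
            y[IDX(i, j, n)] = s;
        }
    }
}

/* One red-black Gauss-Seidel sweep for A*u = b with the same stencil.
 * Points of one color depend only on the other color, so all points of a
 * color can be updated in parallel or vectorized. */
static void gs_redblack_sweep(size_t n, const double *b, double *u) {
    for (int color = 0; color < 2; ++color) {
        for (size_t i = 0; i < n; ++i) {
            for (size_t j = (i + (size_t)color) % 2; j < n; j += 2) {
                double s = b[IDX(i, j, n)];
                if (i > 0)     s += u[IDX(i - 1, j, n)];
                if (i + 1 < n) s += u[IDX(i + 1, j, n)];
                if (j > 0)     s += u[IDX(i, j - 1, n)];
                if (j + 1 < n) s += u[IDX(i, j + 1, n)];
                u[IDX(i, j, n)] = s / 4.0;
            }
        }
    }
}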
Similar Resources
On improving the performance of sparse matrix-vector multiplication
We analyze single-node performance of sparse matrix-vector multiplication by investigating issues of data locality and fine-grained parallelism. We examine the data-locality characteristics of the compressed-sparse-row representation and consider improvements in locality through matrix permutation. Motivated by potential improvements in fine-grained parallelism, we evaluate modified sparse-matrix re...
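Since the snippet above centers on the compressed-sparse-row (CSR) representation, here is a minimal CSR SpMV sketch in C that makes the locality issue concrete: val, col_idx, and row_ptr stream sequentially, while x is gathered through col_idx, which is the irregular access that matrix permutation targets. The array names and types are illustrative, not drawn from the cited paper.

#include <stddef.h>

/* Sketch of y = A*x with A stored in CSR format. */
void spmv_csr(size_t nrows,
              const size_t *row_ptr,   /* length nrows + 1 */
              const size_t *col_idx,   /* length row_ptr[nrows] */
              const double *val,       /* length row_ptr[nrows] */
              const double *x, double *y) {
    for (size_t i = 0; i < nrows; ++i) {
        double s = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            s += val[k] * x[col_idx[k]];   /* indirect, cache-unfriendly load */
        y[i] = s;
    }
}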
Reconfigurable Sparse Matrix-Vector Multiplication on FPGAs
...executing memory-intensive simulations, such as those required for sparse matrix-vector multiplication. This effect is due to the memory bottleneck that is encountered with large arrays that must be stored in dynamic RAM. An FPGA core designed for a target performance that does not unnecessarily exceed the memory-imposed bottleneck can be distributed, along with multiple memory interfaces, into...
Optimizing Sparse Matrix Vector Multiplication on SMPs
We describe optimizations of sparse matrix-vector multiplication on uniprocessors and SMPs. The optimization techniques include register blocking, cache blocking, and matrix reordering. We focus on optimizations that improve performance on SMPs, in particular, matrix reordering implemented using two different graph algorithms. We present a performance study of this algorithmic kernel, showing ho...
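Register blocking, one of the techniques listed above, can be illustrated with a small 2×2 block-CSR (BCSR) SpMV sketch in C: each stored block is a dense 2×2 tile, so its four values and the two matching x entries can be kept in registers across the inner loop. The block size, layout, and names are illustrative assumptions rather than the cited paper's implementation.

#include <stddef.h>

/* Sketch of y = A*x with A stored as 2x2 blocks (BCSR). */
void spmv_bcsr_2x2(size_t nblockrows,
                   const size_t *brow_ptr,  /* length nblockrows + 1 */
                   const size_t *bcol_idx,  /* block column indices */
                   const double *bval,      /* 4 values per block, row-major */
                   const double *x, double *y) {
    for (size_t bi = 0; bi < nblockrows; ++bi) {
        double y0 = 0.0, y1 = 0.0;               /* two output rows in registers */
        for (size_t k = brow_ptr[bi]; k < brow_ptr[bi + 1]; ++k) {
            const double *a = &bval[4 * k];      /* dense 2x2 tile */
            const double x0 = x[2 * bcol_idx[k]];
            const double x1 = x[2 * bcol_idx[k] + 1];
            y0 += a[0] * x0 + a[1] * x1;
            y1 += a[2] * x0 + a[3] * x1;
        }
        y[2 * bi]     = y0;
        y[2 * bi + 1] = y1;
    }
}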
Sparse Matrix-Vector Multiplication on FPGAs
Floating-point Sparse Matrix-Vector Multiplication (SpMXV) is a key computational kernel in scientific and engineering applications. The poor data locality of sparse matrices significantly reduces the performance of SpMXV on general-purpose processors, which rely heavily on the cache hierarchy to achieve high performance. The abundant hardware resources on current FPGAs provide new opportunities to...
Journal
Journal title: Applied Sciences
Year: 2023
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app13158952